The original “2016 New Coder Survey” dataset consists of 113 variables. Most of these variables are answers to survey questions, though a few are computer-generated (e.g. respondent ID and survey start/end times). Over 15,000 observations (i.e. respondents) exist.
The str function output is long and messy, so I won’t print it here. Please consult Free Code Camp’s list of survey questions and possible answers. Boolean, numeric, and categorical types are the majority.
I created six new variables from existing variables:
ifelse statements
cut function on HoursLearning
These new variables bring our total to 119 variables.
## [1] 15620 119
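As a rough sketch of how derived variables like these can be built with ifelse and cut, the snippet below works on a small made-up data frame. The bracket boundaries follow the (0,10], (10,20], (20,40], (40,80] table that appears later in this analysis. The toy values and the `IsDataRole` and `HoursBracket` names are assumptions for illustration, not the variables actually created.

```r
# Toy stand-in for the survey data; all values are made up.
survey <- data.frame(
  HoursLearning   = c(2, 10, 15, 40, 60, NA),
  JobRoleInterest = c("Data Scientist / Data Engineer", "Mobile Developer",
                      NA, "Data Scientist / Data Engineer",
                      "Full-Stack Web Developer", "Mobile Developer"),
  stringsAsFactors = FALSE
)

# ifelse: flag respondents interested in the data role (hypothetical variable).
# A missing answer stays NA.
survey$IsDataRole <- ifelse(
  survey$JobRoleInterest == "Data Scientist / Data Engineer", "yes", "no"
)

# cut: bin weekly learning hours into right-closed brackets (hypothetical name).
survey$HoursBracket <- cut(survey$HoursLearning, breaks = c(0, 10, 20, 40, 80))

dim(survey)                  # rows, columns after adding the new variables
table(survey$HoursBracket)   # respondents per bracket
```

With these breaks, a respondent reporting exactly 0 hours falls outside the lowest bracket and becomes NA, because (0,10] excludes its left endpoint.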
646 respondents answered “Data Scientist/Data Engineer” to the question: “Which one of these roles are you most interested in?”
## [1] 646 119
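A minimal sketch of this kind of split, assuming the answer is stored in a `JobRoleInterest` column using the label shown in the job role table later on. The toy data is made up. Respondents who skipped the question are kept in the complement here, which is consistent with the two subsets adding up to the full 15620 rows.

```r
# Toy stand-in for the survey data; values are made up.
survey <- data.frame(
  JobRoleInterest = c("Data Scientist / Data Engineer", "Mobile Developer",
                      NA, "Front-End Web Developer",
                      "Data Scientist / Data Engineer"),
  stringsAsFactors = FALSE
)

data_role <- "Data Scientist / Data Engineer"

# TRUE only for respondents who explicitly chose the data role.
is_data <- !is.na(survey$JobRoleInterest) & survey$JobRoleInterest == data_role

data_sci     <- survey[is_data, , drop = FALSE]
non_data_sci <- survey[!is_data, , drop = FALSE]  # includes skipped answers

dim(data_sci)
dim(non_data_sci)
```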
Additional comments are included where the results significantly differ from the full new coder survey dataset.
The univariate section intentionally mimics the structure of Free Code Camp’s Medium article for a direct comparison of data science/engineering students and new coders in general. A few additional univariate plots are included to smooth the transition to the plots explored in the bivariate and multivariate sections.
CodeNewbie and Free Code Camp designed the survey, and dozens of coding-related organizations publicized it to their members.
Of the 646 developing data scientists and data engineers who responded to the survey:
## female
## 0.2447917
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 14.00 22.00 26.00 27.72 31.25 65.00 74
This average is 5 months longer than the full survey dataset.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00 3.00 8.00 16.17 20.00 360.00 31
The median programming experience of 8 months is much clearer after logarithmically transforming the long tail data.
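To illustrate why the log transformation helps, here is a small base R sketch on simulated long-tailed data. The simulated values and the +1 offset (needed because some respondents report 0 months) are my assumptions, not necessarily what was done for the plot above.

```r
set.seed(1)

# Simulated long-tailed "months of programming" values, including a zero.
months <- c(0, round(rexp(500, rate = 1 / 16)))

# log10 is undefined at 0, so shift by one month before transforming.
log_months <- log10(months + 1)

summary(months)
hist(log_months, breaks = 20,
     main = "Programming experience (log scale)",
     xlab = "log10(months + 1)")

# The median survives a monotone transformation: back-transforming the
# median of the logged values recovers the median in months.
10^median(log_months) - 1
```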
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00 5.00 10.00 14.41 20.00 80.00 30
Compared to 40% for the full new coder survey, this is a bit shocking. I understand the demand for data scientists and engineers in industry, but I have a hunch these zero counts are caused by the survey’s design. Every respondent that answered the job role of interest question has zero counts for “start your own business” and “freelance.”
The data-related subset has a longer time horizon than the full survey dataset, where 65% are applying within the next year.
The developing data scientists/engineers use Coursera, edX, and Udacity more frequently than new coders in general. These companies have wider subject area scopes than some of the coding-specific resources listed.
6% of new coders from the full survey dataset have attended a bootcamp.
The dominating percentage of North Americans should be expected because Free Code Camp is based in the United States.
Compared to 58% for the full new coder survey, the data-focused subset is more skewed towards post-secondary studies.
Diversity amongst majors is greater compared to the full survey, where Computer Science and Information Technology checked in at #1 and #2 with 17% and 5%, respectively.
Two-thirds of new coders, in general, are currently working.
Employment fields are more spread compared to the full new coder survey, where 50% of respondents work in software development and IT.
The median current salary for the full dataset is $37k.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 6000 25000 43600 48420 60000 200000 390
The median for the full survey dataset is $50k. With data science/engineering being notoriously lucrative in 2016, some respondents might be seeking higher wages.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 6000 40000 60000 61110 80000 200000 65
## has served in military
## 0.06501548
## has children
## 0.1346749
## financially supporting
## 0.03250774
## no spouse
## 0.2137405
## is underemployed
## 0.4705882
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0 76000 150000 194400 240000 1000000 591
This average is $3k more than the full survey dataset.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0 10000 20000 36880 45000 1000000 485
Removing the million dollar outlier, the distribution is much clearer with the majority of debt under $75k. I hope that outlier is a joke.
## has high-speed internet
## 0.8573913
## is receiving disability benefits
## 0.02608696
There isn’t really a singular main feature of interest in the “2016 New Coder Survey” dataset. There are several smaller features, but nothing stands out like diamond price and its relationship to carat weight, cut, colour, etc. in the R diamonds dataset, for example. The diamonds dataset covers two time periods (the existence of the diamond pre-sale and post-sale), whereas the survey dataset only covers a single period (the early stages of an individual’s coding career).
If we could fast-forward several years and survey the same respondents, the main feature of interest might be career earnings (adjusted for cost of living, preferably) and/or self-reported career satisfaction. A predictive model using a combination of variables from the 2016 survey could then be built to estimate career success.
If the survey asked “Are you already working as a data scientist/engineer?” instead of “Are you already working as a software developer?”, the current income variable might be a main feature of interest. Unfortunately, the answer to that question cannot be extracted from the existing variables.
Though there isn’t a main feature of interest, we can separate the respondents who did not answer “Data Scientist/Data Engineer” to the job role interest question (as we already have for those who did) and compare the two subsets using bivariate and multivariate plots.
I will also explore two smaller features, hours dedicated to learning per week and expected next salary, using bivariate and multivariate plots.
There is a lot of long tail data. Most did not require transformation to view the details of the distribution. Programming experience is strongly positively skewed, however, and required log transformation to visually compare those with 3 months of experience to those with 25 years.
That no respondents want to freelance or start their own business seems strange. Perhaps a survey design choice caused these zero counts.
The following operations were performed to tidy, adjust, or change the form of the data:
gather() was used to transform the data from a wide format to a long format. I then transformed the long data into factor format, using the replicate function with the number of yeses as the multiplier. This data is used to create the code event, resource, and podcast bar charts.
The first five operations were performed so bar charts could be created, which wasn’t possible with the original data format. The Americas separation was performed for additional insight.
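The replicate step can be sketched in base R as follows. To keep the example free of package dependencies, the wide-to-long step is replaced by a plain colSums count instead of gather(), and I am assuming the "replicate function" refers to rep(). The column names and values are made up for illustration.

```r
# Toy wide-format data: one column per resource, 1 = "yes", NA = not selected.
resources <- data.frame(
  ResourceCoursera = c(1, NA, 1, 1),
  ResourceEdX      = c(NA, 1, NA, 1),
  ResourceUdacity  = c(1, NA, NA, NA)
)

# Number of yeses per resource.
yes_counts <- colSums(resources, na.rm = TRUE)

# Repeat each resource name once per yes, giving one factor value per answer.
resource_factor <- factor(rep(names(yes_counts), times = yes_counts))

# A factor in this form can be passed straight to a bar chart.
table(resource_factor)
```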
14974 respondents did not answer “Data Scientist/Data Engineer” to the question: “Which one of these roles are you most interested in?”
## [1] 14974 119
The next two plots are created using pairs.panels() from the psych package. They display a scatter plot matrix (SPLOM), with bivariate scatter plots below the diagonal, histograms on the diagonal, and the Pearson correlation above the diagonal.
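The numbers above the diagonal can be reproduced without the plot. The sketch below computes a Pearson correlation matrix in base R on simulated data. The column names, the simulated relationships, and the pairwise handling of missing values are assumptions for illustration.

```r
set.seed(42)
n <- 200

# Simulated respondents: income rises with age, expected salary with income.
age      <- round(runif(n, 18, 60))
income   <- 20000 + 800 * age + rnorm(n, sd = 20000)
expected <- 30000 + 0.4 * income + rnorm(n, sd = 20000)

# Survey data has many NAs, so blank out some incomes.
income[sample(n, 30)] <- NA

nums <- data.frame(Age = age, Income = income, ExpectedEarning = expected)

# Pearson correlations, using every pair of rows where both values exist.
cors <- cor(nums, use = "pairwise.complete.obs", method = "pearson")
round(cors, 2)
```

Using pairwise complete observations matters here because columns such as current income have hundreds of missing values, and dropping every incomplete row would discard most of the data.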
For the data science subset of the survey, all correlations are below 0.4, which supports my statement that no main feature exists. The strongest of the correlations are:
The phenomena revealed are intuitive, but not groundbreaking: you tend to make more money when you are older, you tend to expect your next job to have a high salary if your current one does, and expensive schooling tends to lead to higher income levels.
For the non-data science subset of the survey, all correlations are again below 0.4. Most of the correlations are within 0.1 of the data science subset, except for three:
Interesting. Student debt levels are involved in all three correlations. I bet the aforementioned skew towards post-secondary studies for the data science subset plays a role here, where higher levels of student debt are expected.
Let’s zoom in on the strong age-income correlation, this time for the full survey dataset. Note that the strength exists despite the majority of $200k salaries belonging to respondents under 40.
The earnings vs. age trend, however, isn’t maintained as these individuals prepare to transition to their new field of choice. Younger individuals appear willing to capitalize on lucrative tech salaries and older individuals appear willing to take a pay cut.
Let’s use the full new coder survey for the rest of the analysis. We’ll explore hours dedicated to learning per week and expected next salary. These are variables dependent upon the quality of coding resources, whereas the other numerical ones (e.g. age, income, and programming experience) are set previously.
For the following boxplots, the horizontal line is the median and the “x” is the mean. The top of the box is the third quartile and the bottom is the first quartile. Each whisker extends to the most extreme observation within 1.5 times the interquartile range of the box.
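As a small illustration of these boxplot components, the sketch below uses base R's boxplot.stats on made-up values. Note that boxplot.stats works from hinges, which can differ slightly from the quartiles a plotting package computes, so treat this as an approximation of the plotted boxes rather than the exact method behind them.

```r
# Made-up sample with one extreme value.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 50)

bs <- boxplot.stats(x, coef = 1.5)
bs$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out    # points beyond the whiskers, drawn individually

# The same fences computed by hand from the hinges.
hinges <- fivenum(x)[c(2, 4)]
fences <- c(hinges[1] - 1.5 * diff(hinges),
            hinges[2] + 1.5 * diff(hinges))

# The whisker stops at the most extreme observation inside the fence.
upper_whisker <- max(x[x <= fences[2]])

mean(x)   # the value marked with an "x" in the plots
```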
Hours dedicated to learning results are nearly identical across genders. Do trans new coders spend more time learning? A small sample size issue exists, but I wouldn’t be shocked if a true effect is present here.
##
## male female genderqueer agender trans
## 10766 2840 66 38 36
There is not much differentiation across continents either. All have a median of 10 hours dedicated to learning per week. Asian and African students have the highest means, at 16.4 and 16.8 hours, respectively.
##
## North America Europe Asia South America Africa
## 6744 3358 2178 567 506
## Oceania
## 301
Females actually expect higher salaries than males, with a $9k gap in medians and a $4k gap in means. There is a huge gap in first quartiles, where the 25th percentile female expects $14k more than her male equivalent. As with hours dedicated to learning, transgender new coders have relatively higher expected salaries. Did a particularly ambitious set of trans individuals respond to the survey or are these their true traits?
## Gender: male
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 6000 30000 50000 52620 70000 200000 6763
## --------------------------------------------------------
## Gender: female
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 6000 43650 59000 56620 70000 200000 1532
## --------------------------------------------------------
## Gender: genderqueer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 7000 50000 60000 66970 70000 200000 37
## --------------------------------------------------------
## Gender: agender
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 24000 36000 46500 58220 67500 200000 20
## --------------------------------------------------------
## Gender: trans
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 20000 44250 67500 67230 76250 200000 20
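Grouped summaries in this layout can be produced with by(). The sketch below uses a made-up data frame and assumes the columns are called `Gender` and `ExpectedEarning`. It also shows how a gap in medians can be read off directly.

```r
# Toy stand-in for the survey data; values are made up.
survey <- data.frame(
  Gender          = c("male", "male", "male", "female", "female", "female"),
  ExpectedEarning = c(30000, 50000, 70000, 45000, 60000, NA)
)

# One summary() per gender, printed with a separator between groups.
by(survey$ExpectedEarning, survey$Gender, summary)

# Median expected salary per gender, ignoring missing answers.
medians <- tapply(survey$ExpectedEarning, survey$Gender, median, na.rm = TRUE)

# Gap in medians (female minus male) for the toy data.
medians["female"] - medians["male"]
```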
Whoa. Expected earning by continent varies way more compared to the above three boxplots. North Americans expect the highest range of salaries, with their interquartile range spanning from $50k to $70k. Europe’s 75th percentile is North America’s 25th percentile. I wonder if some European respondents forgot to convert from pounds or euros to US dollars. Expectations in Asia are all over the board.
A lot of these individuals are using similar, if not the same, online educational resources. Labour market economics are cruel.
## ContinentCitizen: North America
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 6000 50000 60000 61820 70000 200000 3700
## --------------------------------------------------------
## ContinentCitizen: Europe
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 6000 15000 30000 36010 50000 200000 2178
## --------------------------------------------------------
## ContinentCitizen: Asia
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 6000 18000 48000 51050 70000 200000 1470
## --------------------------------------------------------
## ContinentCitizen: South America
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 6000 18000 36000 40300 60000 200000 414
## --------------------------------------------------------
## ContinentCitizen: Africa
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 6000 20000 45000 52290 70000 200000 361
## --------------------------------------------------------
## ContinentCitizen: Oceania
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 10000 40000 50000 54810 60000 200000 178
The median respondent that dedicates 40+ hours per week expects $10k more than the median respondents from the other brackets.
##
## (0,10] (10,20] (20,40] (40,80]
## 8175 3564 2394 636
Let’s dig into that 40-80 hour bracket. Fewer than 5% of respondents are dedicating 40+ hours to learning each week. Below are the most common ages (the top row is age and the bottom row is number of respondents) and educational backgrounds for this bracket.
##
## 25 21 26 23 24 20 22 32 30 27
## 40 38 36 35 33 29 29 28 26 24
##
## bachelor's degree
## 249
## some college credit, no degree
## 94
## high school diploma or equivalent (GED)
## 65
## master's degree (non-professional)
## 43
## some high school
## 28
## professional degree (MBA, MD, JD, etc.)
## 24
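The bracket share quoted above can be checked from the bracket counts printed earlier, and the "most common values" tables follow a simple sort-a-table pattern. In the sketch below the bracket counts are taken from the output above, while the `ages` vector is made up.

```r
# Respondents per weekly learning bracket, copied from the table above.
bracket_counts <- c("(0,10]" = 8175, "(10,20]" = 3564,
                    "(20,40]" = 2394, "(40,80]" = 636)

# Share of respondents in each bracket, as a percentage.
share <- bracket_counts / sum(bracket_counts)
round(100 * share, 1)

# Most common values: tabulate, sort descending, keep the top few.
ages <- c(25, 25, 25, 21, 21, 26)   # made-up ages for illustration
top_ages <- head(sort(table(ages), decreasing = TRUE), 3)
top_ages
```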
The most common respondents in this bracket are in their early to mid twenties, and a bachelor’s degree is by far the most common educational background. It appears that they are forgoing traditional forms of higher education like master’s and professional degrees and using those 40+ hour weeks to learn code.
This is the exact situation I’m in with my personalized data science master’s degree. The quality and affordability of online education in 2016 is incredible, though many still aren’t aware of the existence of resources like Free Code Camp, Udacity, and Coursera. If this survey were performed again in a few years, I would expect more respondents to be in the higher brackets.
Again, the most common job roles of interest are:
##
## Full-Stack Web Developer Front-End Web Developer
## 2571 1379
## Back-End Web Developer Data Scientist / Data Engineer
## 704 646
## Mobile Developer User Experience Designer
## 414 275
## DevOps / SysAdmin Product Manager
## 219 191
## Quality Assurance Engineer
## 104
User Experience Designer is by far the most diverse discipline in terms of gender, with about the same number of males as females and the highest percentage of agender, genderqueer, and trans respondents. Mobile development is the most male-dominated discipline at nearly 80%, though full-stack and back-end development are close.
The highest relative popularity for North America (read: biggest purple bar segment) is user experience design. Europe’s is back-end development. Asia’s, South America’s, and Africa’s is mobile development. Oceania’s is data science/engineering. Mobile developer is the most diverse discipline in terms of citizenship.
The skew towards post-secondary studies for data science and data engineering is much clearer here. Mobile development has the highest percentage of respondents with no, some, or only a high school education. This skew will surely reflect itself in the subsequent age boxplot.
Mobile developers are indeed the youngest with a first quartile of 20 years, two years younger than the next youngest discipline. The remaining disciplines are fairly close in age, with front-end development being the oldest with a mean age of 29 years.
## JobRoleInterest: Full-Stack Developer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 11.00 23.00 27.00 28.94 33.00 70.00 294
## --------------------------------------------------------
## JobRoleInterest: Front-End Developer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 12.00 24.00 27.00 29.08 33.00 64.00 193
## --------------------------------------------------------
## JobRoleInterest: Back-End Developer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 13.00 22.00 27.00 28.03 32.00 59.00 103
## --------------------------------------------------------
## JobRoleInterest: Data Scientist / Engineer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 14.00 22.00 26.00 27.72 31.25 65.00 74
## --------------------------------------------------------
## JobRoleInterest: Mobile Developer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 12.0 20.0 24.0 26.2 31.0 54.0 77
## --------------------------------------------------------
## JobRoleInterest: UX Designer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 12.00 22.00 26.00 28.74 32.00 73.00 38
Data scientists-, data engineers-, and back-end developers-in-training have programmed the longest, with a median experience of 8 months. UX designers have the lowest first quartile at two months of programming experience, a full month below the next lowest discipline. Note that the summaries below are expressed in years.
## JobRoleInterest: Full-Stack Developer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 0.250 0.500 1.043 1.000 40.830 88
## --------------------------------------------------------
## JobRoleInterest: Front-End Developer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0000 0.2500 0.5000 0.7917 1.0000 15.0000 43
## --------------------------------------------------------
## JobRoleInterest: Back-End Developer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0000 0.3333 0.6667 1.2680 1.6250 20.0000 33
## --------------------------------------------------------
## JobRoleInterest: Data Scientist / Engineer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0000 0.2500 0.6667 1.3470 1.6670 30.0000 31
## --------------------------------------------------------
## JobRoleInterest: Mobile Developer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 0.250 0.500 1.049 1.250 13.330 15
## --------------------------------------------------------
## JobRoleInterest: UX Designer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0000 0.1667 0.5000 1.0610 1.0000 36.0000 20
Full-stack developers dedicate the most time to learning each week, with 25% of respondents dedicating 30+ hours weekly. UX designers spend the least amount of time learning per week with a mean of 12 hours per week.
## JobRoleInterest: Full-Stack Developer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00 10.00 15.00 19.94 30.00 100.00 108
## --------------------------------------------------------
## JobRoleInterest: Front-End Developer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0 6.0 12.0 16.7 20.0 100.0 48
## --------------------------------------------------------
## JobRoleInterest: Back-End Developer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00 8.00 15.00 18.77 25.00 100.00 40
## --------------------------------------------------------
## JobRoleInterest: Data Scientist / Engineer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00 5.00 10.00 14.41 20.00 80.00 30
## --------------------------------------------------------
## JobRoleInterest: Mobile Developer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00 5.00 12.00 17.76 25.00 100.00 21
## --------------------------------------------------------
## JobRoleInterest: UX Designer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00 5.00 10.00 12.04 15.00 63.00 19
Respondents interested in data science and/or engineering clearly have the highest current salaries. Their third quartile of $60k per year is $8k higher than the next highest discipline. There isn’t much income differentiation between the remaining job roles of interest.
## JobRoleInterest: Full-Stack Developer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 6000 20000 35000 41010 52000 200000 1508
## --------------------------------------------------------
## JobRoleInterest: Front-End Developer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 6000 20000 35000 37020 48000 200000 806
## --------------------------------------------------------
## JobRoleInterest: Back-End Developer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 6000 17750 32000 36990 49250 200000 436
## --------------------------------------------------------
## JobRoleInterest: Data Scientist / Engineer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 6000 25000 43600 48420 60000 200000 390
## --------------------------------------------------------
## JobRoleInterest: Mobile Developer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 6000 20000 33800 36420 46500 155000 286
## --------------------------------------------------------
## JobRoleInterest: UX Designer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 6000 20000 31500 35730 50000 90000 175
Respondents interested in data science/engineering expect to earn the most at their next job. Given the aforementioned correlation between current salaries and expected salaries, this is not a surprise. Note that expected salaries are higher than current salaries (see the previous boxplot) across the board.
## JobRoleInterest: Full-Stack Developer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 6000 40000 55000 54670 70000 200000 225
## --------------------------------------------------------
## JobRoleInterest: Front-End Developer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 6000 30000 50000 48070 60000 200000 118
## --------------------------------------------------------
## JobRoleInterest: Back-End Developer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 6000 30000 50000 50060 65000 200000 73
## --------------------------------------------------------
## JobRoleInterest: Data Scientist / Engineer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 6000 40000 60000 61110 80000 200000 65
## --------------------------------------------------------
## JobRoleInterest: Mobile Developer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 6000 30000 50000 52740 70000 200000 48
## --------------------------------------------------------
## JobRoleInterest: UX Designer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 6000 40000 51000 55100 70000 200000 40
The data science/engineering subset of the survey is largely similar to the non-data science/engineering subset, except for three correlations involving student debt owed. The skew towards post-secondary studies for the data-focused subset is the likely culprit.
The correlation between current salary and age is stronger than the correlation between age and the expected salary for the respondent’s first data science/engineering job.
Hours dedicated to learning per week doesn’t appear to vary much with gender or continent, though sample size issues exist.
Expected salary for a respondent’s next job varies strongly by continent. Females also appear to have a much higher bottom line for expected salary than males. Those who dedicate more than 40 hours a week to learning expect higher salaries as well.
The majority of respondents for all job roles of interest are male, North American, and have bachelor’s degrees. Age, programming experience, hours dedicated to learning, current salary, and expected next salary all vary depending on job role of interest. One or two of the disciplines stand out from the pack for each of the five quantitative variables.
No exceedingly strong relationship exists. All correlations are below 0.4.
Current salary and expected next salary have the strongest relationship for both subsets, with correlations of 0.36 and 0.38.
Europe’s 75th percentile for expected next salary is North America’s 25th percentile ($50k USD). Perhaps some European respondents forgot to convert from pounds or euros to US dollars.
Let’s dig deeper into the strongest correlation: current salary versus expected next salary. Again, this new job is presumably where the respondent will put their new coding skills to use.
Plotting income vs. expected earning across genders, the first impression is that there are a lot of male data points. This abundance makes it hard to tell if the wage gap presents itself in this dataset. Looking at each gender’s presence above the $50k lines, my first instinct is that the gap exists. Males definitely have the highest proportion of $150k+ salaries. Stay tuned for the final plots section, where I’ll determine the presence of the gap definitively.
The same multivariate plot is generated below, but for the question “Are you an ethnic minority in your country?” Again, it is difficult to determine definitively if a wage gap is present. Non-ethnic minorities definitely have a higher proportion of $150k+ salaries, but a huge number of data points are clustered in the bottom left quadrant. It looks like minorities are better represented above the $50k expected salary line, but not as much above the $50k current salary line. We’ll see for sure if this is indeed true in the final plots section.
Let’s combine all of the purple/pink boxplots from the bivariate plots section into one radar chart. The mean for each numerical variable normalized between 0 and 1 is plotted for each job role of interest.
One thing jumps out immediately: developing data scientists/engineers lead the pack for programming experience, current salary, and expected next salary. Beyond that, however, overplotting is again an issue, which makes it difficult to internalize other patterns in the data. I’ll fix that for all three multivariate plots in the final plots section.
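The normalization behind a radar chart like this can be sketched with min-max scaling. The means below are copied from the per-role summaries in the bivariate section (programming experience in years). Which roles the chart actually includes, and the assumption that scaling is done across the role means, are mine.

```r
# Mean of each numerical variable per job role, from the summaries above.
role_means <- data.frame(
  Age             = c(28.94, 29.08, 28.03, 27.72, 26.20, 28.74),
  YearsProgramming = c(1.043, 0.7917, 1.268, 1.347, 1.049, 1.061),
  HoursLearning   = c(19.94, 16.70, 18.77, 14.41, 17.76, 12.04),
  Income          = c(41010, 37020, 36990, 48420, 36420, 35730),
  ExpectedEarning = c(54670, 48070, 50060, 61110, 52740, 55100),
  row.names = c("Full-Stack", "Front-End", "Back-End",
                "Data Sci/Eng", "Mobile", "UX")
)

# Min-max scaling: the smallest mean maps to 0 and the largest to 1.
normalize <- function(x) (x - min(x)) / (max(x) - min(x))

scaled <- as.data.frame(lapply(role_means, normalize),
                        row.names = rownames(role_means))
round(scaled, 2)
```

Each row of `scaled` is one polygon in the chart, with one axis per column. A role that leads on a variable touches the outer edge on that axis.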
Males and ethnic majorities dominate the $150k+ salary range, but an overall wage gap between genders and ethnicities for this dataset is not definitive when all the data points are plotted together. I thought the gaps would instantly be clear. More exploration needs to be done, and more will be done in the next section.
For males vs. females, the density overlay tells us that the wage gap, i.e. males earn more than females, actually does not exist in this dataset. Though males do have the highest proportion of elite ($150k+) current and expected next salaries, it appears that a similar proportion of males and females are in the top right quadrant above both $50k lines. Females expect higher next salaries, while current salaries are similar.
An ethnicity-based wage gap does not appear to exist either, based on the density overlay for the second plot. Current salary densities are nearly identical. Ethnic minorities appear optimistic about the changing diversity landscape via their notable presence in the top left quadrant. This quadrant is where current salaries are below $50k, but expected next salaries are above $50k.
Higher dispersion exists for the majority demographic in both cases, with notable densities near the origin. The relationship between expected and current salary is much stronger for the minority demographic.
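The quadrant comparison used in this section can also be expressed numerically rather than read off a density overlay. The sketch below computes the share of each group in the top right and top left quadrants around the $50k lines. The data is made up and the `Income` and `ExpectedEarning` column names are assumptions.

```r
# Toy stand-in for the survey data; values are made up.
survey <- data.frame(
  Gender          = rep(c("male", "female"), each = 4),
  Income          = c(30000, 60000, 80000, 160000,
                      30000, 55000, 70000, 40000),
  ExpectedEarning = c(40000, 70000, 90000, 180000,
                      60000, 65000, 80000, 45000)
)

# Top right: above $50k on both current and expected salary.
top_right <- survey$Income > 50000 & survey$ExpectedEarning > 50000

# Top left: current salary at or below $50k, expected salary above $50k.
top_left <- survey$Income <= 50000 & survey$ExpectedEarning > 50000

# Share of each gender falling in each quadrant.
share_top_right <- tapply(top_right, survey$Gender, mean)
share_top_left  <- tapply(top_left, survey$Gender, mean)

share_top_right
share_top_left
```

Comparing shares rather than raw counts keeps the much larger male sample from dominating the comparison.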
Perhaps new coders aren’t reflective of the working population in general, where data suggests that a racial and gender wage gap still exists in 2016.
The dataset’s two main categorical variables - gender and citizenship by continent - convey the basic demographics of each job role of interest.
The majority of survey respondents are males and North Americans. Mobile development has the highest percentage of males. User experience design has the highest percentage of North Americans.
User experience designer is the most diverse role in terms of gender, with about the same number of males as females and the highest percentage of agender, genderqueer, and trans respondents.
Mobile developer is the most diverse role in terms of citizenship, with the lowest percentage of North Americans and relatively high percentages of Asians, South Americans, and Africans.
This faceted radar chart, where the normalized mean (between 0 and 1) for each numerical variable is plotted for each job role of interest, clarifies the differences between disciplines.
Developing data scientists/engineers make the most money, expect the most money for their next job, and have the most programming experience. They have the largest amount of area within their polygon.
Full-stack developers are relatively older and dedicate the most time to learning weekly. They also have a large polygon area.
Front-end developers are green in terms of programming experience and have the lowest salary expectations for their first job where they advertise their new web development skills. They also have relatively low current salaries. These three factors contribute to the smallest polygon area.
Mobile developers are the youngest and currently do not make much money. These characteristics are expected of the discipline with the highest proportion of respondents with no, some, or only a high school education. They have the second smallest polygon area.
Developing data scientists and engineers are slightly different than new coders in general.
The two datasets do share plenty of common trends. Demographics are similar. Most are willing to relocate. Most don’t use podcasts or attend events yet.
Older new coders are willing to take a pay cut when transitioning to a job where they advertise their new coding skills. Younger new coders intend to increase their earning potential by capitalizing on demand for coding.
Weekly hours dedicated to learning doesn’t differ much across genders and citizenships by continent. Next expected salary does, however. Most people aren’t replacing the traditional college/university route with full-time online education…yet. Those that are expect higher salaries.
Gender and continent distributions across job roles of interest vary. Females appear drawn to user experience design. Asians, South Americans, and Africans appear drawn to mobile development. School degree obtained does not vary much by discipline overall, though data science/engineering and mobile development stick out as the most and least seasoned in terms of education, respectively.
Developing data scientists/engineers have the highest current salaries, expect the highest next salaries, and have the most programming experience. Front-end developers are the oldest. Full-stack developers dedicate the most time to learning per week.
Mobile developers are the youngest and have some of the lowest current salaries. Front-end developers are the least experienced coders and expect the lowest next salaries. UX designers spend the fewest hours learning weekly.
The gender and racial wage gaps do not present themselves in this dataset. Perhaps new coders aren’t reflective of the working population in general.
The successes of this exploration are largely due to the detailed design of the Free Code Camp survey.
The main struggle I encountered in this exploration was the lack of a main feature of interest, like the diamond dataset’s price variable. It would be awesome if we could survey the same respondents in a decade or so. We could combine career earnings and career satisfaction with the 2016 survey’s results to build a predictive model to estimate career success.
These are the people who are learning data science and engineering. It is clear that free, self-paced learning resources are important.